A Multi-Core Pipelined Architecture for Parallel Computing
Authors
Abstract
Parallel programming on multi-core processors has become the industry's biggest software challenge. This paper proposes a novel parallel architecture that executes sequential programs through multi-core pipelining, based on program slicing with a new dynamic memory/cache management technology. The architecture is well suited to processing large geospatial data in parallel without parallel programming. It addresses the need to relocate data from one memory hierarchy to another in a multi-core environment: the new memory management technology inserts a layer of abstraction between the processor and the memory hierarchy, so that the data stay in one place while the processor effectively migrates as tasks change. The architecture makes full use of the pipeline, automatically partitioning the data and scheduling the partitions onto the cores through the pipeline. Its most important advantage is that most existing sequential programs can be used directly with almost no change, whereas conventional parallel programming has to take into account scheduling, load balancing, and data distribution. The architecture can also be applied to other multi-core/many-core architectures and to heterogeneous systems. The paper describes the design of the new multi-core architecture in detail and analyzes its time complexity and performance. Experimental results and a performance comparison with existing multi-core architectures demonstrate the effectiveness, flexibility, and diversity of the new architecture, in particular for Big Data parallel processing.
Keywords: Multi-Core Architecture; Pipelining; Sequential Programs; Program Slicing; Crossbar Switching; Parallel Computing; Big Data
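The idea of partitioning data automatically and handing the slices to several cores while the sequential program stays unchanged can be illustrated with a small software analogy. The sketch below is not the paper's hardware architecture or its memory management layer. The names `partition`, `run_on_workers`, and `sequential_sum_of_squares` are hypothetical, and a thread pool merely stands in for the cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition(data: Sequence[int], n_slices: int) -> List[Sequence[int]]:
    """Split data into at most n_slices contiguous, near-equal slices."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    size, extra = divmod(len(data), n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        # The first `extra` slices receive one additional element.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when data is short
            slices.append(data[start:end])
        start = end
    return slices


def sequential_sum_of_squares(values: Sequence[int]) -> int:
    """An ordinary sequential kernel, written with no parallel constructs."""
    total = 0
    for v in values:
        total += v * v
    return total


def run_on_workers(kernel: Callable[[Sequence[int]], int],
                   data: Sequence[int],
                   n_workers: int = 4) -> List[int]:
    """Apply the unchanged kernel to each slice on a pool of workers.

    Assumption: the workers are threads standing in for cores; the
    partitioning and scheduling here are done in software.
    """
    slices = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves slice order in the returned results.
        return list(pool.map(kernel, slices))


data = list(range(10))
partials = run_on_workers(sequential_sum_of_squares, data, n_workers=4)
total = sum(partials)
```

The kernel contains no scheduling, load-balancing, or data-distribution logic; those concerns live entirely in `partition` and `run_on_workers`, which mirrors the separation the abstract describes.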
Similar resources
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources with respect to thread concurrency. To resolve this problem, this paper proposes a novel method that parallelizes the GA by designing three concurrent kernels, each of which running some depe...
Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems
With the popularity of mobile devices, people require more computing power to run emerging applications. However, the increase in power consumption is a major problem because power is quite limited in embedded systems. Our goal is to consider power consumption along with latency and throughput. We proposed a heuristic algorithm, called Parallel Pipeline Latency Optimization for high performance ...
HEVC Hardware Decoder Implementation for UHD Video Applications
In this paper, an efficient hardware architecture that exploits parallel processing for HEVC decoders is proposed by introducing (i) a Coding Tree Unit (CTU)-level pipelined architecture for single-core based processing; and (ii) a multi-core based parallel processing architecture for picture partition decoding with low latency while not requiring additional resources for in-loop filtering (ILF...
New debugging concept for symmetric multiprocessing (SMP)
However, a multi-core processor is not necessarily required for the parallelization of tasks. Hardware multithreading, for example, is an approach that also enables parallelization on single-core processors. Here, we deal with a basic problem of cores with a pipeline architecture: cache misses or data dependencies between the instructions mean that the pipelined instruction processing has to be ...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014